This page documents RapidMiner Studio version 10.0.

Merge Attributes (Operator Toolbox)

Synopsis

This operator merges two or more ExampleSets into one by placing all of their attributes side by side, row by row. The first row is merged with the first row of the other ExampleSets, the second with the second, and so on. If the input ExampleSets differ in size, the resulting ExampleSet contains missing values for all attributes originating from the smaller ExampleSets.
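The row-by-row behavior described above can be illustrated outside RapidMiner with a minimal Python sketch. It is not the operator's implementation: the helper `merge_rows`, the list-of-dicts table representation, and the use of `None` as a stand-in for a missing value are all assumptions made for illustration, and the sketch assumes the inputs have no equally named attributes.

```python
from itertools import zip_longest

MISSING = None  # assumed stand-in for a RapidMiner missing value


def merge_rows(*tables):
    """Merge tables (lists of row dicts) side by side: row i with row i."""
    # Take each table's attribute names from its first row (empty tables add none).
    columns = [list(t[0].keys()) if t else [] for t in tables]
    merged = []
    # zip_longest keeps going until the longest table is exhausted.
    for rows in zip_longest(*tables, fillvalue=None):
        out = {}
        for cols, row in zip(columns, rows):
            for name in cols:
                # A shorter table has no row at this position,
                # so its attributes are filled with the missing marker.
                out[name] = row[name] if row is not None else MISSING
        merged.append(out)
    return merged


# Hypothetical example data: three rows in one table, two in the other.
customers = [{"age": 31}, {"age": 45}, {"age": 28}]
labels = [{"churn": "yes"}, {"churn": "no"}]

merged = merge_rows(customers, labels)
for row in merged:
    print(row)
```

The third merged row carries a missing value for `churn`, because the second table has only two rows. No key is compared between the tables; rows are matched purely by position.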

Input

  • example set (Data Table)

    This operator can have multiple ExampleSet inputs. Whenever an ExampleSet is connected, a new input port becomes available for the next ExampleSet. The order of the inputs is preserved. Collections of ExampleSets can also be connected.

Output

  • merged set (Data Table)

    The merged ExampleSet.

Parameters

  • handling_of_duplicate_attributes This parameter defines how equally named attributes are treated.
    • rename: All attributes with the same name will be renamed to attributeName_ExampleSetNumber.
    • keep_only_first: Of several attributes with the same name, only the first one is kept in the merged ExampleSet. Identically named attributes from the other inputs, including ExampleSets inside connected collections, are ignored, irrespective of their values.
    Range:
  • handling_of_special_attributes This parameter defines how special attributes are treated during the merge. Each special role can occur only once in the resulting ExampleSet.
    • keep_first_special_other_regular: The first attribute with a special role keeps its role. All other attributes with this role will be changed to regular attributes.
    • keep_only_first: Only the first attribute with a special role is kept. All attributes with the same special role in other ExampleSets are ignored.
    • change_all_special_to_regular: All special attributes are changed to regular ones.
    Range:
  • handling_of_duplicate_annotations This parameter defines how equally named annotations are treated.
    • rename: All annotations with the same name will be renamed to annotationName_ExampleSetNumber.
    • keep_only_first: Of several annotations with the same name, only the first one is kept in the merged ExampleSet. Identically named annotations from the other inputs, including ExampleSets inside connected collections, are ignored.
    Range:
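
The parameter options above can be sketched in Python to make their effect on attribute names and roles concrete. This is an illustration under stated assumptions, not the operator's code: `resolve_names` and `resolve_roles` are hypothetical helpers, ExampleSets are numbered from 1 in input order, and in `rename` mode every colliding attribute (including the first) receives a suffix. The documentation does not specify the numbering base or whether the first occurrence is renamed, so the actual operator may differ on both points.

```python
from collections import Counter


def resolve_names(name_lists, mode="rename"):
    """Per input, return the output attribute names (None = attribute dropped)."""
    if mode not in ("rename", "keep_only_first"):
        raise ValueError(f"unknown mode: {mode}")
    # Count how many inputs carry each name (names are unique within one input).
    counts = Counter(name for names in name_lists for name in names)
    seen = set()
    resolved = []
    for number, names in enumerate(name_lists, start=1):  # assumed 1-based numbering
        out = []
        for name in names:
            if counts[name] == 1:
                out.append(name)                 # no collision: keep as-is
            elif mode == "rename":
                out.append(f"{name}_{number}")   # attributeName_ExampleSetNumber
            else:
                # keep_only_first: later duplicates are dropped.
                out.append(name if name not in seen else None)
            seen.add(name)
        resolved.append(out)
    return resolved


def resolve_roles(attributes, mode="keep_first_special_other_regular"):
    """attributes: (name, role) pairs in input order; role None means regular."""
    modes = ("keep_first_special_other_regular", "keep_only_first",
             "change_all_special_to_regular")
    if mode not in modes:
        raise ValueError(f"unknown mode: {mode}")
    taken = set()
    result = []
    for name, role in attributes:
        if role is None or mode == "change_all_special_to_regular":
            result.append((name, None))
        elif role not in taken:
            taken.add(role)                      # first attribute keeps the role
            result.append((name, role))
        elif mode == "keep_first_special_other_regular":
            result.append((name, None))          # demoted to a regular attribute
        # keep_only_first: later attributes with a taken role are skipped.
    return result


# Hypothetical inputs for illustration.
names = [["id", "age"], ["id", "income"]]
renamed = resolve_names(names, "rename")
first_only = resolve_names(names, "keep_only_first")

attrs = [("id_a", "id"), ("age", None), ("id_b", "id"), ("target", "label")]
demoted = resolve_roles(attrs, "keep_first_special_other_regular")
dropped = resolve_roles(attrs, "keep_only_first")
```

With these inputs, `rename` yields `id_1` and `id_2`, while `keep_only_first` keeps `id` from the first input and drops it from the second. For roles, the second `id` attribute is either demoted to a regular attribute or removed, depending on the mode.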

Tutorial Processes

Simple Merge of two ExampleSets

This tutorial process is an example of a simple merge of two ExampleSets. The operators Generate Direct Mailing Data and Generate Churn Data are used to generate test data which is, for demonstration purposes, assumed to contain information about the same examples. The Merge operator is used to merge the ExampleSets together by appending the attributes of the second ExampleSet to the first ExampleSet without a join operation.

Using Merge for a complex preprocessing

This tutorial process demonstrates the use of Merge for more complex preprocessing. The Generate Massive Data operator generates a large ExampleSet with 3000 examples and 500 attributes. Within a Loop Attributes operator, an attribute-specific preprocessing step is applied to every attribute, so each iteration produces an ExampleSet with only one attribute. The Loop Attributes operator therefore delivers a collection of 500 ExampleSets, each containing a single attribute. The Merge operator then merges these 500 ExampleSets back into one ExampleSet with 500 attributes.
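
The split, process, and merge pattern of this tutorial can be sketched in plain Python. The sketch is only an analogy to the RapidMiner process: the column-dict table representation, the helper names, and the min-max scaling used as the per-attribute step are assumptions, since the tutorial does not state which preprocessing it applies. A tiny two-attribute table stands in for the 3000 x 500 ExampleSet.

```python
def split_by_attribute(table):
    """Turn {name: values} into a collection of single-attribute tables."""
    return [{name: values} for name, values in table.items()]


def preprocess(single):
    """Hypothetical per-attribute step: min-max scale the one column to [0, 1]."""
    (name, values), = single.items()  # exactly one attribute per table
    low, high = min(values), max(values)
    span = high - low
    # A constant column has no spread; map it to all zeros.
    scaled = [(v - low) / span if span else 0.0 for v in values]
    return {name: scaled}


def merge_collection(collection):
    """Merge single-attribute tables back into one table, preserving order."""
    merged = {}
    for single in collection:
        merged.update(single)
    return merged


# Small stand-in for the generated data.
table = {"att1": [1.0, 2.0, 3.0], "att2": [10.0, 10.0, 30.0]}

# Analogue of Loop Attributes: one single-attribute table per iteration.
collection = [preprocess(single) for single in split_by_attribute(table)]

# Analogue of the merge step: recombine the collection into one table.
merged = merge_collection(collection)
print(merged)
```

The collection holds one table per original attribute, and merging it restores a single table with the same attribute count and row order as the input.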